Project description

You have server logs with data on Yandex.Afisha visits from June 2017 through May 2018 Dump file with all orders for the period Marketing expenses statistics

You are going to study: How people use the product When they start to buy How much money each customer brings When they pay off

Instructions for completing the project Step 1. Download the data and prepare it for analysis Store the data on visits, orders, and expenses in variables. Optimize the data for analysis. Make sure each column contains the correct data type. File paths: /datasets/visits_log_us.csv. Download dataset /datasets/orders_log_us.csv. Download dataset /datasets/costs_us.csv. Download dataset

Step 2. Make reports and calculate metrics: Product How many people use it every day, week, and month? How many sessions are there per day? (One user might have more than one session.) What is the length of each session? How often do users come back? Sales When do people start buying? (In KPI analysis, we're usually interested in knowing the time that elapses between registration and conversion — when the user becomes a customer. For example, if registration and the first purchase occur on the same day, the user might fall into category Conversion 0d. If the first purchase happens the next day, it will be Conversion 1d. You can use any approach that lets you compare the conversions of different cohorts, so that you can determine which cohort, or marketing channel, is most effective.)

How many orders do they make during a given period of time? What is the average purchase size? How much money do they bring? (LTV) Marketing How much money was spent? Overall/per source/over time How much did customer acquisition from each of the sources cost? How worthwhile where the investments? (ROI) Plot graphs to display how these metrics differ for various devices and ad sources and how they change in time.

Step 3. Write a conclusion: advise marketing experts how much money to invest and where. What sources/platforms would you recommend? Back up your choice: what metrics did you focus on? Why? What conclusions did you draw after finding the metric values? Format: Complete the task in Jupyter Notebook. Enter the code in code cells and text explanations in markdown cells. Apply formatting and headings. Description of the data

Dataframes were checked, columns name were fixed to lower case and removed spaces, dates which were outside of the study dates were removed from all dataframes, zero revenue rows were removed from the orders dataframe, and there are no duplicates.

Product

How many people use it every day, week, and month?

We can see that desktop user engagment peaked at December 2017 but then started to decline quite rapidly, maybe a change in the website caused this? We can see the touch users engament also declined at the same time.

We can see that desktop devices are more popular among users.

How many sessions are there per day?

We can see that the number of session for desktop devices is more then double of that of touch users. We can also see that there was a day on November where users sessions peaked, we can look if there was an offer on that date.

What is the length of each session?

We can see that the user average session length for desktop users is almost double then touch users. We can also see that desktop users short sessions are almost triple then the touch users. Maybe the desktop website is more likly to result in conversions?

How often do users come back?

We will see the retention rate by week

Retention rate per week is in decline sharply after the first week in all weeks. It will also be helpful to see monthly activity of users

We can also see in the monthly breakdown a big decline in users coming back in all months. The June cohort maintianed the best performance, as the retention rate was the highest then all other cohorts.

In conclusion for the product section we learned that there was a peak in user engagment in December but since then user engagment is going down, especially for the desktop users. We are also seeing higher session times for desktop users, maybe the mobile version shoud be improved. We can also see that the retention rates are low, maybe a discount or member program can make users stay on the platform fro longer?

Sales

When do people start buying?

I will calculate the number of days passed between first activity date and first purchase date per user, and will create a new column 'category' to show the number of days.

Now I can subtract the columns to find the mean number of days between registration and first purchase

The median number of days between user registration and purchase is 1 day. There is a small precentage of negative purchase date which might be an error in the logs, further investigation will be needed. Desktop number of purchases is 4 time larger then the touch category.

How many orders do they make during a given period of time?

to check the amount of purchases, I will count the number of purchases and categorize it by month cohort.

We can see that October, November and December are the strongs months in terms of number of purchases. July and April in contrast are the weakest. In June the number of purchases was double then the number of users at that month, which is intersting and maybe corrolated to the stronger retention rate in this month we found above.

What is the average purchase size?

We can see the average purchase size for each cohort. September was where the largest average purchase size was reached at close to 9, whereas February was the lowest.

How much money do they bring? (LTV)

The total amount of money a customer brings to the company on average by making purchases. When calculating LTV, analysts usually consider the gross profit from a customer per period. As it's not mentioned if the profit is gross or not, I am assuming it is.

We see clearly for which cohorts revenue per user increased in the months following the month of first purchase. in all cohorts the revenue increased, but the June and September are the strongest which correlate with the number of purchases per user we found above.

In conclusion for sales part, we checked and see that the conversion average is 1 day, and that Desktop number of purchases is 4 time larger then the touch category. We also seen that October, November and December are the strongest months in terms of number of purchases per user. September also was where the largest average purchase size was reached. We also seen that LTV increased in each monthly cohorts and that June and September has the best performance.

Marketing

How much money was spent? Overall/per source/over time

We can see that source_id 3 has the largest amount of money spent on where November was the largest month.

How much did customer acquisition from each of the sources cost?

How worthwhile where the investments? (ROI)

We can see that source 3 has the lowest romi so it can be recommended that this source will be dropped out and the money invested in other sources ids.

Conclusion

In conclusion I have inspected how many people use the products every day, week, and month; how many sessions are there per day; and what is the length of each session per device.

In the product section we learned that there was a peak in user engagment in December but since then user engagment is going down, especially for the desktop users. We are also seeing higher session times for desktop users, maybe the mobile version should be improved. We can also see that the retention rates are low, maybe a discount or member program can make users stay on the platform fro longer? After the user first use the product, he usually don't come back.

In sales part, we checked and see that the conversion average is 1 day, and that Desktop number of purchases is 4 time larger then the touch category. We also seen that October, November and December are the strongest months in terms of number of purchases per user. September also was where the largest average purchase size was reached. We also seen that LTV increased in each monthly cohorts and that June and September has the best performance.

We also saw that source id 3 is the most expensive source id, and has the lowest romi, my recommendation is to avoid investing in this source id and distribute the funds between the remaining source ids.

Finally we saw the that Septmber cohort of romi is the strongst overall while June cohort is the second strongest, and January is the weakest cohort month in terms of romi.